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Abstract 

The method of surrogate data is a tool to test whether data were gen- 
erated by some class of model. Tests based on the periodogram have been 
proposed to decide if linear systems driven by Gaussian noise could have 
generated a sample time series. We show that this procedure based on the 
periodogram, in general, misspecifies the test statistic. Based on the theory 
of linear systems we suggest an alternative procedure to obtain the correct 
distribution of the test statistic and discuss problems of this approach. 
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1 Introduction 



The method of surro Excite dcitci was proposed by Theiler et al. general 
procedure to test whether data are consistent with some class of models. It 
has been used to test a proposed nonlinear dynamics in geophysics certain 
properties of neural spike dynamics |§, but the most common application is 
testing for linear models [|, |], |7|, [8], |[ [lOfl . In order to test the hypothesis that 
the data are consistent with being generated by a linear system, the FT algorithm 
is applied. Based on an example and the theory of linear stochastic systems we 
will show that this algorithm does not produce the correct distribution of time 
series and therefore might not generally yield the correct distribution of the test 
statistic. 

The surrogate data method is similar to a Monte Carlo implementation of 

a standard hypothesis test in which one specifies a procedure for generating an 

ensemble of new time series with a distribution that matches a model to be 

tested. One generates several sample time series, the surrogate data. Next, a 

selected feature function / is evaluated for the original time series x and for each 

of the surrogates. If the distribution of the feature values is well approximated 

by a Gaussian it can be characterized by its mean /x surr and standard deviation 

°"surr- If /( x )> the feature value calculated for the measured time series, is more 

than a few standard deviations away from /x surr , the null hypothesis that the 

original data were generated by the tested model is rejected, and the significance 

is reported as the distance z = l^surr-ZWI . If the distribution of the feature is 
r CT surr 

not well approximated by a Gaussian distribution, nonparametric rank statistics 
should be used. 

The surrogate data method differs from a simple Monte Carlo implementation 
of a hypothesis test in that it tests not against a single model, but a class of 
models, i.e. linear systems driven by Gaussian noise. The idea is to select a 
single model from the class on the basis of the measured data x, and then do a 
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Monte Carlo hypothesis test for the selected model and the original data. 

The statistical properties of a time series generated by a linear process are 
specified by the autocovariance function (ACF) or equivalently by its Fourier 
transform, the power spectrum S(u). Theiler et al. use the periodogram Per (a;), 
i.e., the squared modulus of the Fourier transform of the data, to specify the 
linear model to be testecf]. They generate surrogates (FT algorithm) by drawing 



complex numbers with amplitude sj Per(u;) and random phases and Fourier trans- 
forming back into the time domain. There are two problems in this procedure: 
First, the periodogram Per(cu) is not a consistent estimator of the power spec- 
trum S(u), and second, the distribution of surrogates generated by randomizing 
the phases but not the amplitudes does not correspond to any linear stochastic 
system at all. The consequences of these two points on the estimated distribution 
of the feature, in case of Gaussianity /i SU rr and a SU rr, are the primary focus of 
this paper. 

In the next section we briefly summarize the main ideas of surrogate data 
testing for linearity based on the periodogram and demonstrate by a simple ex- 
ample that this procedure might misspecificy the distribution of the test statistic. 
To explain this result we refer to some results of the theory of linear systems in 
section | and discuss the consequences for surrogate data testing for linearity in 



2 Periodogram Based Surrogates 

The key to a hypothesis test is the distribution of a test statistic / when the null 

hypothesis H is true, i.e. p(f\H ). Given a measured sample x, one calculates 

/(x) and decides if it is reasonable to suppose that /(x) was drawn from p(f\H ), 

in general by an a priori choosen significance level. 

*An invertible nonlinearity in the observation function should be rectified by a change of 
coordinates that forces the distribution of the data to be Gaussian lOl . 




section B. 
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The surrogate data method was proposed by Theiler et al. primarily to test 
the null hypothesis that the data come from a linear Gaussian process. Once 
again, the key to designing a hypothesis test is the identification of the distri- 
bution of a test statistic p(f\H ). For a class as large and complex as all linear 
processes, it seems impossible. The idea of Theiler et al. is to use the observation 
itself to limit the size of the whole class to a single member. 

Surrogate data testing needs two ingredients: A procedure to generate data 
from the class of models to be tested and a feature or statistic to perform the 
testing. If a single model is given, the first step is simple. One generates a large 
number of time series and approximates the distribution p(f\H ). For instance, 
see the paper by Witt et al. [0] in which a nonlinear difference equation proposed 
to describe the dynamics of the reversals of the earth's magnetic field was rejected 
using a feature that describes the transition probabilities. 

Theiler et al. proposed their FT algorithm to determine if a given time series 
could have been produced by a linear system driven by Gaussian noise. The basic 
idea is that the statistical properties of a linear process are captured completely 
by its spectral properties. Therefore, they first calculate the periodogram Per (a;), 
i.e. the squared modulus of the Fourier transform, for the positive frequencies. In 
order to generate the surrogate data they draw complex numbers with amplitude 



Per(a;) and random phases, symmetrize the negative frequencies with x{— uj) = 



x{lo)* to obtain a real time series by Fourier transforming back into the time 
domain. 

We now give a simple example that demonstrates that the FT algorithm, 
might not reproduce the parent distribution of a feature for a linear process. We 
used a simple Gaussian white noise process with mean zero and unit variance 
and evaluated the simple feature: 





N-l 



(1) 



Please 



locate fig.l 
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near here 



for new realizations of the process and for ensembles generated by the FT 
algorithm. These ensembles were based on different realizations of the process. 
The results were independent from the number of data points N used. We only 
show the results for iV = 16384 in fig. |l|. To exclude any variance dependent 
effect, all time series are normalized to mean zero and variance 1. The lowest line 
displays the cumulative distribution of the feature from 100 new realizations of 
the process. The lines above show the results for 100 realizations based on the 
FT algorithm. The FT algorithm fails to recover the parent distribution. The 
surrogate distributions p(/|surr) are grouped too tightly around the value of the 
test statistic /(x) (marked by O) for that time series on which the surrogates 
were based. 

The Kolmogorov- Smirnov- test for Gaussianity does not reject the consis- 
tency with a normal distribution at the 99% level of confidence for the shown 
distributions. Thus, they may be described by mean and standard deviation. 
The standard deviation of the true distribution is underestimated by a factor of 
two. The Kolmogorov- Smirnov- test for consistency of two distributions applied 
to each surrogate distributions and the distribution of new realizations of the pro- 
cess rejected the null hypothesis in every case at the 99% level of confidence. 

The choosen test statistic is indeed able to detect nonlinearities since its 
expectation value for a Gaussian white noise process with unit variance is 
= 1.12... and for "gaussified" data of the logistic map, r = 4, which shows 
also a flat spectrum, it is 1.09... . 

3 Linear Processes 

The misallocation of /i SU rr and the underestimation of the true standard deviation 
by cr surr that appear in fig. [I] can be explained in terms of the theory of linear 
processes. 
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3.1 Time Domain 

In order to model the fluctuations in the periodicity of sunspot data Yule 
introduced linear stochastic systems: 
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x(t) =a 1 x(t-l) + a 2 x(t-2) + e(t), e(t) ~ JV(0, a 2 ) . (2) 

These are now called autoregressive (AR) processes can be generalized to au- 
toregressive moving average (ARMA) processes by including not only past values 
of the observations but also past values of the noise. The general ARMA[p,q] 
process reads: 

v 1 

x(t) = Oix{t -z)+£ M* ~ J) + <*) ( 3 ) 
i=i j=i 

State space models are a further generalization. They consist of a vector 
valued AR[1] process and a linear observation function with observation noise: 



x{t) = Ax(t - 1) + e(t) 

e(t)~AT(0,Q) (4) 
y(t) = Cx(t)+r)(t) 

where Q is the covariance matrix of the dynamical noise, and R the variance 
of the observation noise. State space models model observation noise explicitly. 
For AR and ARMA models, observation noise causes the estimated parameters 
to be biased towards zero. 

Once the type of model has been chosen, one must select the model order. 
Generally, one fits a sequence of higher and higher order models and tests the pre- 
diction errors for consistency with white noise, e.g. by a Kolmogorov - Smirnov 



- test on the periodogram |L3| 
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If a particular class and order of model is selected, when the parameters 
of the model are estimated using data from a random process, the estimated 
parameters are random variables. By restricting the number of parameters to be 
fit, one insures that as the number of observed data increases, the variance of the 
estimates goes to zero. In the next subsection we emphasize that periodograms 
do not have this property. 

3.2 Frequency Domain 

We now examine the spectrum of linear processes and the relationship between 
the spectrum and the periodogram. The autocovariance function R(£) is given 
by 

R(r) = (x(t)x(t + r)) , 
and the spectrum is defined by: 



oo 



S{u>) = £ R(t)e-^ = (|^=$>(t) 



t=— oo \ V 1 v t 

2^ 



\x(oj)\ 2 ) = (Per(u)) 
The spectrum of a state space model is given by : 

S{u) = C(l- Ae^)~ 1 Q(l- Ae^)~ 1T C T + R . (5) 

Thus the spectrum S(u) of this process is a smooth function of uj. Spectra of AR 
or ARMA processes are special cases of eq. (§]). 

We now turn to the distribution of the periodogram. For details see [|1^, |15 



we only summarize the results. Asymptotically, the Fourier transform of a linear 
stochastic process 

1 i 
x ^ = It? 51 x ( f ) cos{ut) + -= x{t) sin(wt) (6) 
viv ( ViV t 



is a complex Gaussian random variable 

x(co) = JV(0, ±S(u)) + iST(0, -%;)) (7) 

where S(u) is independent of N. Furthermore, Fourier components are asymp- 
totically uncorrelated for different frequencies. In terms of modulus and phase 
instead of real and imaginary part, this means that not only is the phase a random 
variable but so is the modulus. 

The distribution of the periodogram 

PerM = £ x(t) coa{ut)\ + ^-L £ x{t) sm{ut)^ (8) 

follows from eq. (0) to be a x 2 distritubion with two degrees of freedom \\- 

PerM ~ \s{yj) X \ (9) 

again independent of N. 

Since the mean and variance of \\ are 2 and 4 respectively, the standard 
deviation of the periodogram is equal to its mean, i.e. 

Per(cu) = S(u)±S(u). 

Thus, the periodogram fluctuates wildly regardless of the number of data points 
used to calculate it. It is an unbiased but not a consistent estimator of the 
spectrum since its variance does not decrease with N. This is true not only for 
linear stochastic processes but also for nonlinear stochastic and even for chaotic 
processes if only the autocovariance function decays fast enough. 



4 Consequences for the Surrogate Data Method 

The characteristics of the spectrum and periodogram described in the previous 
section lead to the following two points concerning the FT algorithm: 
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1. The FT algorithm does not use its estimate of the spectrum correctly. Both 
phases and amplitudes should be randomized. 

2. A better estimator of the spectrum than the periodogram is required. 
Below we elaborate on these points. 

4.1 Randomizing Phases and Amplitudes 

Eq. (0) means that at each frequency the Fourier transform of a time series from 
a linear stochastic system is a two dimensional Gaussian random variable. The 
FT algorithm, by fixing the modulus and only randomizing the phases, restricts 
the new realization to a set of measure zero in the set of all possible realizations of 
the linear stochastic system under consideration. It interprets the periodogram of 
a stochastic process as the spectrum of N/2 linear deterministic oscillators with 
fixed amplitudes. Thus, instead of testing for linear stochastic systems, the FT 
algorithm by construction tests for the time - discrete version of the following 
deterministic process y(t): 

Xi (t) = -^ 2 Xi (t), x 2(0) + (^^j = Per(LUi) 

N/2 

y(t) = $>(t) . (10) 

i=i 

Note, that this process depends on the length N of the time series. 

If the true spectrum of the process were known the following algorithm would 
yield the correct distribution of surrogates. The algorithm is based on eq. (|^) that 
connects the spectrum S(u) with the variance of the complex random variable 
x(u). 

• For each frequency uji = i = 1, y, draw two Gaussian distributed 
random numbers, multiply them by ^J^S(u>i) and take the result as the real 
and imaginary parts of the Fourier transform of the desired data. 
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• For reasons of symmetry the last frequency is always real for even number 
of data points. Thus, in this case, only one Gaussian distributed random 
number has to be drawn. 

• To obtain a real valued time series, chose the Fourier components for the 
negative frequencies as the complex conjugates of the components at posi- 
tive frequencies, i.e., x{— lo) = x*(uj). 

• Do an inverse Fourier transformation of x(oj) to get the surrogate time 
series. 

Please 

Fig. H shows the results of a simulation study analogously to fig. [I]. Again, the locate fig.2 
bottom line displays the cumulative distribution of the feature from eq. (|I|) for near here 
new realizations of the white noise process. The upper lines show the distributions 
based on the proposed algorithm using the true flat spectrum of white noise. 



4.2 Estimating the Spectrum 

The critical issue for the procedure described in the previous section is spectral 
estimation. There are two general approaches commonly used for this purpose: 
Fit an AR or ARMA model to the data and calculate the spectrum from the 
resulting parameters or smooth the periodogram. For more than the brief dis- 



cussion given here, see p~3], [15] 



Spectral estimation by fitting AR or ARMA models 

This method cannot be recommended for two reasons. On the one hand, as dis- 
cussed in section |3], in presence of observation noise these models do not give 
correct results even if the underlying dynamic is linear. On the other hand, if the 
process is nonlinear they are, in general, not consistent estimators of the spec- 
trum. Of course, the parameters of linear models determine the spectrum. But 
not every spectrum, e.g. of a chaotic process, is well described by a linear model, 
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since the prediction errors will not be consistent with white noise. Increasing the 
order of the model does not help, because the "spectrum" calculated from the 
parameters will begin to interpolate the periodogram. For the same reason, sur- 
rogate data generated by realizations from a fitted ARMA model test, in general, 
for some linear model, but not for that specified by the spectrum. 

Estimation by smoothing the periodogram 

Estimating the spectrum by smoothing the periodogram rests on three results 
of the theory of linear processes: the expectation value of the periodogram is 
the spectrum, the spectrum is a continuous function, and the random variables 
Per(a>) and Per(u/) are asymptotically uncorrelated. The idea is to smooth the 
periodogram to reduce the variance (or equivalently split the time series into 
segments and average the periodograms). Denoting the number of data points 
by N, the number of frequency bins included in the smoothing by n s , one can 
show that in the limit of N — > oo and n s — > oo, with n s constrained to increase 
slowly enough to insure % — > 0, this procedure leads to a consistent estimator 
for the spectrum. 

For finite data sequences, the smoothing will introduce a bias and there is a 
trade-off between bias and variance of the estimator which is discussed at length 



in 13, 15 . 



Surrogates based on an estimated spectrum 

The bias and the variance of the spectral estimate lead to a misspecification of 

the surrogate data distribution that vanishes asymptotically. Please 

Fig. H is analogous to fig. [I]. It shows the results of a simulation study using locate fig. 3 
a moving average kernel of width 2000 frequency bins to estimate the spectrum near here 
and the algorithm of section [O to generate the surrogate data. The Kolmogorov 



Smirnov - test for consistency of two distributions does not reject the null 
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hypothesis in 7 of the 9 cases shown at the 99% level of confidence, demonstrating 
the convergence. 



5 Conclusion 

The appeal of the general idea of surrogate data testing is clarified by a com- 
parison with the tests for linearity proposed in the stochastic systems literature. 
These tests rely on asymptotic results for linear processes, mainly concerning 
third order properties. There are two classes of these tests. The first class is 
based on fitting an AR model to the data and investigating third order prop- 
erties of the prediction errors. Examples of these parametric tests are given by 



1C, 17]. The other class of tests is based on the asympotitic property that the 



three-point-correlations: 

c(n,T 2 ) = 5>(t) x(t + n) x(t + r 2 ) (li) 

t 

are zero for any linear process. For examples of these nonparametric tests see 
[ ITR [HJ. In order to perform the test, c(t\,T2) is Fourier transformed and the 
normalized bispectrum is estimated. For linear Gaussian processes it should be 
zero and the consistency with zero can be tested. Both classes of tests test data 
of a given process by a property that is fulfilled by all possible linear processes. 

The intuition behind the surrogate data method is that only that linear pro- 
cess that, jugded by its spectral properties, could have generated the data needs 
to be tested. 

The sampling properties of a test statistic of a linear process depend on the 
the details of the process and the statistic. In particular, the variance will often 
be larger for processes with sharper peaks in the spectrum or equivalently longer 
coherence times. For example, the variance of the sample mean x for an iid 
Gaussian process A/"(0, a 2 ) is but if the process has correlations, the variance 
of the sample mean will decrease more slowly as a function of N. By construction, 
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the surrogate data method produces the process specific sampling properties. 

The power of any test, i.e. the ability to detect a violation of the null hypoth- 
esis, depends on the statistic. The bispectral tests considering the three-point- 
correlations are sensitive to quadratic nonlinearities but can be fooled completely 
by processes showing only cubic nonlinearities like the Duffing - oscillator. The 
general scheme of surrogate data testing can use any statistic, and the power of 
the test will depend on the statistic chosen. Thus one can detect a larger class 
of violations of the null hypothesis. 

We showed by an example that the FT algorithm proposed by Theiler et al. 
[|IJ might specify the distribution of the test statistic incorrectly, independent of 
the number of data points used. Based on the theory of linear stochastic systems, 
we proposed an alternative that involves spectral estimation. Therefore, testing 
for linearity by the idea of is, in general, also a procedure that is 

only valid asymptotically. For a finite number of data, simulation studies should 
be performed to evaluate the estimation error. 
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Figure captions 



Figure 1: Cumulative distributions based on 100 realizations for the test statistic 
/( x ) = aFTI 2 \ x (t + 1) — x(t)\. The line at the bottom displays the distribution 
for new realizations of the process. The lines above show the distributions for 
the FT algorithm based on different realizations of the process. The values of the 
test statistic /(x) for the realizations which were surrogated are marked. The 
most extreme values of the cumulative distributions are not plotted for clearness. 

Figure 2: Cumulative distributions based on 100 realizations for the test statistic 
/( x ) = aFZT S \ x (t + 1) — x(i)\. The line at the bottom displays the distribution 
for new realizations of the process. The lines above show the distributions based 
on the algorithm of section 4.1 using the true spectrum. The most extreme values 
of the cumulative distributions are not plotted for clearness. 

Figure 3: Cumulative distributions based on 100 realizations for the test statistic 
/(x) = j^—^ J2 \ x (t + 1) — x(t)\. The line at the bottom displays the distribution 
for new realizations of the process. The lines above show the distributions based 
on the algorithm of section 4.1. The values of the test statistic /(x) for the 
realizations which were surrogated are marked. The most extreme values of the 
cumulative distributions are not plotted for clearness. 
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